Diffuser: Efficient Transformers with Multi-Hop Attention Diffusion for Long Sequences
Authors
Abstract
Efficient Transformers have been developed for long sequence modeling, due to their subquadratic memory and time complexity. Sparse Transformer is a popular approach to improving the efficiency of Transformers by restricting self-attention to locations specified by predefined sparse patterns. However, leveraging sparsity may sacrifice expressiveness compared to full-attention, when important token correlations are multiple hops away. To combine the advantages of both the efficiency of the sparse Transformer and the expressiveness of the full-attention Transformer, we propose Diffuser, a new state-of-the-art efficient Transformer. Diffuser incorporates all token interactions within one attention layer while maintaining low computation and memory costs. The key idea is to expand the receptive field of sparse attention using Attention Diffusion, which computes multi-hop token correlations based on all paths between corresponding disconnected tokens, besides attention among neighboring tokens. Theoretically, we show the expressiveness of Diffuser as a universal sequence approximator for sequence-to-sequence modeling, and investigate its ability to approximate full-attention by analyzing the graph expander property from the spectral perspective. Experimentally, we investigate the effectiveness of Diffuser with extensive evaluations, including language modeling, image modeling, and Long Range Arena (LRA). Evaluation results show that Diffuser achieves improvements by an average of 0.94% on text classification tasks and 2.30% on LRA, with 1.67x memory savings compared to state-of-the-art benchmarks, which demonstrates the superior performance of Diffuser in both expressiveness and efficiency aspects.
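The multi-hop idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the sliding-window pattern, the uniform attention weights, the PageRank-style recursion Z <- (1 - alpha) * A * Z + alpha * V, and the names local_attention, diffuse, and alpha are all assumptions chosen for illustration. The sketch only shows how repeatedly propagating values over a sparse attention graph lets a token receive weight from tokens it is not directly connected to.

```python
def local_attention(n, window):
    """Row-normalized sliding-window attention: token i attends to |i - j| <= window."""
    rows = []
    for i in range(n):
        neighbors = [j for j in range(n) if abs(i - j) <= window]
        weight = 1.0 / len(neighbors)  # assumption: uniform weights stand in for softmax scores
        rows.append({j: weight for j in neighbors})
    return rows


def diffuse(attn, values, hops, alpha=0.2):
    """Assumed diffusion rule: iterate Z <- (1 - alpha) * A @ Z + alpha * V for `hops` steps."""
    dim = len(values[0])
    z = [row[:] for row in values]
    for _ in range(hops):
        nxt = []
        for i, nbrs in enumerate(attn):
            mixed = [0.0] * dim
            for j, w in nbrs.items():  # only sparse neighbors are visited
                for d in range(dim):
                    mixed[d] += w * z[j][d]
            nxt.append([(1 - alpha) * mixed[d] + alpha * values[i][d] for d in range(dim)])
        z = nxt
    return z


n = 8
attn = local_attention(n, window=1)
# One-hot values make each output row equal to the effective attention weights.
one_hot = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
one_hop = diffuse(attn, one_hot, hops=1)
multi_hop = diffuse(attn, one_hot, hops=n - 1)
reach_1 = sum(1 for w in one_hop[0] if w > 0)    # tokens visible to token 0 after one hop
reach_k = sum(1 for w in multi_hop[0] if w > 0)  # tokens visible after n - 1 hops
print(reach_1, reach_k)
```

With a window of 1, token 0 places weight only on itself and its right neighbor after one hop, but on every token after n - 1 hops, while each step still touches only the sparse neighbor lists.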
Similar resources
Ruminating Reader: Reasoning with Gated Multi-Hop Attention
To answer the question in machine comprehension (MC) task, the models need to establish the interaction between the question and the context. To tackle the problem that the single-pass model cannot reflect on and correct its answer, we present Ruminating Reader. Ruminating Reader adds a second pass of attention and a novel information fusion component to the Bi-Directional Attention Flow model ...
Efficient multi-hop communications in Bluetooth scatternets
This study proposes an integrated ad hoc routing and time-slot scheduling (IARTSS) scheme to address the problem of ad hoc routing in Bluetooth networks. Our proposed scheme contains four main mechanisms to address the different facets of the problem, namely Compensation-Based Time-Slot Assignment (CTSA), Traffic Differentiation Queueing (TDQ), Adaptive Master-Slave Switching (AMSS), and an Enha...
Energy Efficient Data Collection in WSN with Multi-Hop Routing
Wireless Sensor Networks (WSNs) are usually self-organized wireless networks that consist of a number of smart sensor nodes working together in many tracking and monitoring applications. The main tasks of these sensor nodes are, first, the systematic collection of data and, second, the transmission of the gathered data to a distant base station (BS). Hence network lifetime becomes an important parameter fo...
Efficient Feature Tracking for Long Video Sequences
This work is concerned with real-time feature tracking for long video sequences. In order to achieve efficient and robust tracking, we propose two interrelated enhancements to the well-known Shi-Tomasi-Kanade tracker. Our first contribution is the integration of a linear illumination compensation method into the inverse compositional approach for affine motion estimation. The resulting algorithm...
Energy Efficient Reliable Communication for Multi-hop Wireless Networks
Current algorithms for minimum-energy routing in wireless networks typically select minimum-cost multi-hop paths. In scenarios where the transmission power is fixed, each link has the same cost and the minimum-hop path is selected. In situations where the transmission power can be varied with the distance of the link, the link cost is higher for longer hops; the energy-aware routing algorithms ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26502